{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 第2章-感知机模型-算法收敛性\n",
    "### 定理2.1 Novikoff\n",
    "&emsp;&emsp;假设训练数据集$T=\\{(x_1,y_1),(x_2,y_2),\\ldots, (x_N,y_N)\\}$是线性可分的，其中$x_i \\in \\mathcal{X} = \\mathrm{R}^n,y_i \\in \\mathcal{Y} = \\{-1,+1\\}, i=1,2,\\cdots,N$则  \n",
    "（1）存在满足条件$\\|\\hat{w}_{opt}\\|=1$的超平面$\\hat{w}_{opt} \\cdot \\hat{x}=w_{opt} \\cdot x + b_{opt}=0$将训练数据集完全正确分开；且存在$\\gamma > 0$，对所有$i=1,2,\\cdots,N$满足$$y_i(\\hat{w}_{opt} \\cdot \\hat{x})=y_i(w_{opt} \\cdot x_i + b_{opt}) \\geqslant \\gamma$$\n",
    "（2）令$R=\\max_{1 \\leqslant i \\leqslant N} \\|\\hat{x}_i\\|$，则感知机算法2.1在训练数据集上的误分类次数$k$满足下面不等式$$k \\leqslant \\left(\\frac{R}{\\gamma} \\right)^2$$\n",
    "### 解释   \n",
    "超平面$w_{opt} \\cdot x + b_{opt}=0$为可以将训练数据集线性可分的。  \n",
    "先定义$\\hat{w}_{opt}=(w^T_{opt},b_{opt})^T$，$\\hat{x}=(x^T,1)^T$  \n",
    "再计算$\\hat{w}_{opt}$和$\\hat{x}$的内积（对应部分相乘再求和）：$\\hat{w}_{opt} \\cdot \\hat{x}=w_{opt} \\cdot x+b_{opt}$  \n",
    "所以超平面$w_{opt} \\cdot x + b_{opt}=0$可以用$\\hat{w}_{opt} \\cdot \\hat{x}=0$来做等价变换。  \n",
    "为了让$\\hat{w}_{opt}$表示唯一，对其进行约束，使得其长度等于单位长度：$\\|\\hat{w}_{opt}\\|=1$  \n",
    "\n",
    "### 证明 \n",
    "#### 证明（1）\n",
    "可知对于任意超平面都有$y_{i}(\\hat{w}_{opt} \\cdot \\hat{x}_{i})>0$，所以存在$\\gamma=\\min _{i} \\{y_{i}(\\hat{w}_{opt}, x_{i})\\}$，于是得证。  \n",
    "\n",
    "#### 证明（2）\n",
    "&emsp;&emsp;根据这个算法，每找到一个点，都会对$w$进行修正，修正的总次数$k$值一定是有上界的，要得到这个结论，需要两个步骤。  \n",
    "**第一步：** 需要证明$\\hat{w}_{k} \\cdot \\hat{w}_{opt} \\geqslant k \\eta \\gamma$，其中$\\eta$就是算法中的学习率，$\\hat{w}_{k}$就是在第$k$个误分类点修正之后所得到的。假设$\\hat{w}_0=(0,0,\\cdots,0)^T$  \n",
    "$$\\begin{aligned}\\hat{w}_{k} \\cdot \\hat{w}_{opt} &=(\\hat{w}_{k-1}+\\eta y_{i} \\hat{x}_{i}) \\cdot \\hat{w}_{opt} \\\\\n",
    "&=\\hat{w}_{k-1} \\cdot \\hat{w}_{opt}+\\eta y_{i} \\hat{x}_{i} \\cdot \\hat{w}_{opt} \\\\\n",
    "& \\geqslant \\hat{w}_{k-1} \\cdot \\hat{w}_{opt} + \\eta \\gamma \\\\\n",
    "& \\geqslant \\hat{w}_{k-2} \\cdot \\hat{w}_{opt} + \\eta \\gamma + \\eta \\gamma \\\\\n",
    "& \\cdots \\\\\n",
    "& \\geqslant \\hat{w}_0 \\cdot \\hat{w}_{opt} + k \\eta \\gamma \\\\\n",
    "& = k \\eta \\gamma\n",
    "\\end{aligned}$$\n",
    "第一步得证。  \n",
    "**第二步：** 需要证明$\\|\\hat{w}_k\\|^2$有上界的，小于某个值。  \n",
    "二范数的平方裂项：$$\\begin{aligned} \\|a+b\\|^2\n",
    "&= (a+b)^T(a+b) \\\\\n",
    "&= \\|a\\|^2 + 2a \\cdot b + \\|b\\|^2\\\\\n",
    "\\end{aligned}$$\n",
    "开始证明：$$\\begin{aligned} \\|\\hat{w}_k\\|^2\n",
    "&= \\|\\hat{w}_{k-1} + \\eta y_i \\hat{x}_i\\|^2 \\\\\n",
    "&= \\|\\hat{w}_{k-1}\\|^2 + 2\\eta y_i \\hat{w}_{k-1} \\cdot \\hat{x}_i + \\eta^2\\|\\hat{x}_i\\|^2\n",
    "\\end{aligned}$$\n",
    "&emsp;&emsp;上式中间一项，由于是误分类点，可得$2\\eta y_i \\hat{w}_{k-1} \\cdot \\hat{x}_i < 0$，又由于$\\|\\hat{x}_i\\| \\leqslant R$，所以$\\|\\hat{w}_{k-1}\\|^2 + 2\\eta y_i \\hat{w}_{k-1} \\cdot \\hat{x}_i + \\eta^2\\|\\hat{x}_i\\|^2 \\leqslant \\|\\hat{w}_{k-1}\\|^2 + \\eta^2 R^2$，根据递归，最终可以得到：$$\\begin{aligned} \\|\\hat{w}_k\\|^2 \n",
    "& \\leqslant \\|\\hat{w}_{k-1}\\|^2 + \\eta^2 R^2 \\\\\n",
    "& \\leqslant \\cdots \\\\\n",
    "& \\leqslant \\|\\hat{w}_0\\|^2 + k\\eta^2 R^2 \\\\\n",
    "&= k\\eta^2 R^2\n",
    "\\end{aligned}$$\n",
    "第二步得证。  \n",
    "\n",
    "-----\n",
    "现根据上面两步继续推导：根据柯西不等式，得到$\\hat{w}_k \\cdot \\hat{w}_{opt} \\leqslant \\|\\hat{w}_k\\| \\cdot \\|\\hat{w}_{opt}\\|$  \n",
    "$\\because \\|\\hat{w}_{opt}\\|=1$  \n",
    "$\\therefore \\hat{w}_k \\cdot \\hat{w}_{opt} \\leqslant \\|\\hat{w}_k\\|$  \n",
    "由第二步可知$\\|\\hat{w}_k\\| \\leqslant \\sqrt{k} \\eta R$  \n",
    "由第一步可知$\\hat{w}_{k} \\cdot \\hat{w}_{opt} \\geqslant k \\eta \\gamma$  \n",
    "$ \\therefore k \\eta \\gamma \\leqslant \\hat{w}_{k} \\cdot \\hat{w}_{opt} \\leqslant  \\|\\hat{w}_k\\| \\leqslant \\sqrt{k} \\eta R$  \n",
    "$ \\therefore k \\eta \\gamma \\leqslant \\sqrt{k \\eta^2 R^2}$  \n",
    "$ \\therefore k \\leqslant \\left(\\frac{R}{\\gamma}\\right)^2$  \n",
    "&emsp;&emsp;故结论二得证，也就是说误分类次数$k$是有上界的，随机梯度下降法在有限步之后就可以使得所有的训练集中的实例都正确分类。  \n",
    "\n",
    "### 思考\n",
    "现考虑在结论二的证明中得到的两步结论：  \n",
    "1. $\\hat{w}_{k} \\cdot \\hat{w}_{opt} \\geqslant k \\eta \\gamma$\n",
    "2. $\\|\\hat{w}_k\\|^2 \\leqslant k\\eta^2 R^2$\n",
    "\n",
    "$\\hat{w}_{k} \\cdot \\hat{w}_{opt}$内积是逐渐在增大的，因为$k$是逐渐增大的，$\\hat{w}_k$的长度是可以被限制住的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
